[00:02.470 --> 00:05.710]  Good afternoon, good evening, or even good morning,
[00:05.710 --> 00:10.170]  depending on which part of the world you are listening to this talk from.
[00:10.410 --> 00:11.770]  And no, it does not get old.
[00:11.770 --> 00:15.730]  I still want to give a huge shout-out to the CryptoVillage organizing community
[00:15.730 --> 00:19.670]  for making this virtual con happen for us.
[00:21.010 --> 00:24.470]  Welcome to the second talk of the day from my side.
[00:25.270 --> 00:27.610]  In this talk, I'm going to touch base upon
[00:27.610 --> 00:31.670]  how to store sensitive information securely
[00:31.670 --> 00:36.190]  so that we can safeguard it reasonably well
[00:36.190 --> 00:39.970]  against any kinds of offline cracking attempts.
[00:40.270 --> 00:43.550]  Before we go ahead, I'd like to introduce myself.
[00:43.550 --> 00:44.990]  I'm Manasi Sheth.
[00:44.990 --> 00:47.190]  I work as a security researcher
[00:47.190 --> 00:51.810]  at a leading static analysis company called Veracode.
[00:52.710 --> 00:56.250]  My primary responsibility here is to be on top
[00:56.250 --> 00:59.770]  of the latest and greatest happenings in the security field,
[00:59.770 --> 01:03.470]  more specifically in the application security domain,
[01:03.470 --> 01:09.330]  and transfer that knowledge to make sure our customers are safe.
[01:10.330 --> 01:12.310]  I'm a huge crypto enthusiast.
[01:12.310 --> 01:17.910]  I have spent a lot of time, or actually a reasonable amount of time,
[01:17.910 --> 01:23.350]  understanding different base crypto blocks.
[01:24.350 --> 01:30.130]  Its practical implementations, how it can be used in the real world.
[01:30.170 --> 01:32.870]  I understand a lot of anti-patterns.
[01:32.870 --> 01:37.230]  I've spent a lot of time looking at different crypto implementations
[01:37.230 --> 01:39.350]  across programming languages,
[01:39.350 --> 01:43.130]  and I try to translate all this expertise
[01:43.130 --> 01:47.010]  into making sure our customers' codebases are safe
[01:47.010 --> 01:50.730]  and to help pick up any anti-patterns.
[01:53.510 --> 01:55.830]  Recently, I got thinking,
[01:55.830 --> 02:00.270]  why are we seeing so many data breaches in the first place?
[02:00.910 --> 02:02.230]  Eventually, I realized,
[02:02.230 --> 02:06.250]  okay, this is going to be a matter of when rather than if.
[02:06.250 --> 02:09.750]  So, what can we do to protect this information
[02:09.750 --> 02:11.570]  once it is already breached,
[02:11.870 --> 02:14.510]  mainly from any kind of offline cracking?
[02:14.510 --> 02:17.690]  How can we make the information cracking process
[02:17.690 --> 02:21.150]  so expensive that it is almost useless?
[02:21.150 --> 02:23.110]  So, in that quest,
[02:23.110 --> 02:26.430]  the first place I'd definitely go, or for that matter
[02:26.430 --> 02:28.870]  the first thing which comes to anyone's mind,
[02:28.870 --> 02:31.910]  would be Troy Hunt's most organized,
[02:31.910 --> 02:36.870]  most informative site about this called Have I Been Pwned.
[02:38.170 --> 02:42.170]  In this site, he keeps a database of all the breaches
[02:42.170 --> 02:44.930]  which have happened in probably the last decade
[02:45.710 --> 02:49.750]  and gives out information about what might have happened,
[02:49.750 --> 02:54.150]  what has happened, and all those kind of neat stuff.
[02:54.610 --> 02:58.630]  So, instead of putting a big slide of shame kind of thing,
[02:58.630 --> 03:01.670]  like who got hacked and what were the reasons and stuff,
[03:01.670 --> 03:06.310]  what I tried to do was look at all the domains which were hacked
[03:06.310 --> 03:09.250]  and what kind of mechanism were they using
[03:09.250 --> 03:12.870]  for storing any kind of sensitive information.
[03:14.270 --> 03:17.950]  So, this is what my analysis says.
[03:17.950 --> 03:20.690]  There were around 453 unique domains,
[03:20.690 --> 03:23.030]  13% were storing it in plain text,
[03:23.030 --> 03:26.190]  20% as hashes without any salting,
[03:26.190 --> 03:28.290]  30% as salted hash,
[03:28.730 --> 03:32.950]  around 15% using key derivation functions,
[03:32.950 --> 03:34.550]  which is a great step,
[03:34.550 --> 03:38.890]  and roughly 23% decided not to disclose,
[03:38.890 --> 03:42.550]  which in my opinion is plain text,
[03:42.550 --> 03:44.990]  but no judgments here.
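The storage categories in this breakdown can be contrasted with a minimal Python sketch. It uses only the standard library; the password value, the choice of SHA-256, and the iteration count are illustrative assumptions, not figures from the talk.

```python
import hashlib
import os

password = b"correct horse battery staple"  # illustrative value only

# 1. Plain hash, no salt: identical passwords always give identical digests,
#    so precomputed tables and bulk guessing work across all accounts.
unsalted = hashlib.sha256(password).hexdigest()

# 2. Salted hash: a random per-user salt makes digests unique per account,
#    but each guess still costs only one fast hash.
salt = os.urandom(16)
salted = hashlib.sha256(salt + password).hexdigest()

# The same password under a different salt gives a different digest.
other_salted = hashlib.sha256(os.urandom(16) + password).hexdigest()

# 3. Key derivation function: salted and deliberately slow. The iteration
#    count is an arbitrary illustrative value, not a recommendation.
kdf = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000).hex()
```

Only the third form makes each offline guess expensive; the first two differ mainly in whether one precomputed table can be reused across accounts.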
[03:46.350 --> 03:49.610]  So, looking at it, I was thinking,
[03:49.610 --> 03:53.010]  okay, why were they breached in the first place?
[03:53.150 --> 03:57.310]  Most of the time it was because they did not pay a lot of attention
[03:57.310 --> 04:02.110]  to other kinds of security issues in their applications.
[04:02.190 --> 04:04.750]  A lot of them were simple SQL injections
[04:04.750 --> 04:08.350]  where entire databases were dumped out.
[04:08.750 --> 04:12.270]  Open S3 buckets was not an uncommon thing I saw.
[04:12.270 --> 04:15.450]  There were a lot of unprotected endpoints,
[04:16.090 --> 04:18.770]  which kind of leaked this data.
[04:20.630 --> 04:22.530]  So, that was my first lesson learned,
[04:22.530 --> 04:24.770]  that why are these things happening?
[04:26.790 --> 04:31.350]  And now, then mainly my focus was towards thinking about,
[04:31.350 --> 04:33.290]  okay, this breach has already happened,
[04:33.290 --> 04:35.590]  or these breaches are always going to keep happening
[04:36.230 --> 04:38.830]  even for innocent reasons sometimes,
[04:38.830 --> 04:40.810]  when there is nothing innocent about security.
[04:42.690 --> 04:45.430]  So, my next thing was, okay, now what can...
[04:45.430 --> 04:48.730]  what happens after these breaches are done?
[04:49.290 --> 04:51.290]  And my first thought was like, oh, wow,
[04:51.290 --> 04:54.390]  the modern computer architecture is getting cheaper and cheaper
[04:54.390 --> 04:58.230]  and it's going to keep getting cheaper from now on.
[04:58.230 --> 05:03.030]  All these modern GPUs with extremely high parallelization capabilities
[05:03.030 --> 05:05.150]  can crack these things in minutes
[05:05.150 --> 05:07.510]  and making the cost much more...
[05:07.510 --> 05:09.830]  making the password cracking cost much cheaper
[05:09.830 --> 05:13.950]  for any kind of offline mechanisms.
[05:14.810 --> 05:21.030]  With the whole Bitcoin mining philosophy of cracking,
[05:21.030 --> 05:23.190]  there are so many ASICs...
[05:23.190 --> 05:28.090]  sorry, application-specific integrated circuits out there,
[05:28.090 --> 05:31.890]  which makes the cost of cracking much cheaper.
[05:32.970 --> 05:37.070]  There are literally trillions of hashes happening within seconds
[05:37.070 --> 05:40.810]  and this can be easily mapped to any kind of crypto-primitives.
[05:41.230 --> 05:43.670]  All these things are going to keep making password cracking
[05:43.670 --> 05:47.210]  much cheaper with time.
[05:48.170 --> 05:49.950]  So, what should we do?
[05:49.950 --> 05:54.430]  Well, the need of the hour was always stretching this information out
[05:54.430 --> 05:55.990]  so that it takes...
[05:55.990 --> 05:58.270]  we have to throw a lot of computational resources
[05:58.790 --> 06:02.330]  like CPU and memory to compute each password.
[06:02.710 --> 06:04.950]  With that, we are going to reduce the speed
[06:04.950 --> 06:07.050]  with which the passwords are being calculated.
[06:07.070 --> 06:11.150]  This will greatly increase the offline cracking time
[06:11.930 --> 06:15.510]  and make us resilient towards any kind of
[06:15.510 --> 06:18.210]  modern computer architectures or brute-forcing
[06:18.210 --> 06:22.010]  or any kind of time-memory trade-off attacks
[06:22.010 --> 06:25.830]  or rainbow tables, dictionary attacks,
[06:25.830 --> 06:27.390]  all those kind of things.
[06:28.010 --> 06:30.410]  So, that's what we really needed.
[06:31.030 --> 06:33.470]  And how did we achieve that?
[06:34.470 --> 06:37.950]  Using key derivation functions.
[06:37.950 --> 06:40.710]  Well, this concept of key derivation function
[06:40.710 --> 06:43.450]  isn't specific to password hashing
[06:43.450 --> 06:45.570]  or secret information hashing.
[06:45.570 --> 06:48.590]  In fact, it came into existence for actually
[06:48.590 --> 06:53.370]  creating key material out of low-entropy inputs.
[06:54.310 --> 06:57.310]  But at that time, when everyone was in this
[06:57.310 --> 06:59.660]  cat-and-mouse race,
[06:59.660 --> 07:03.560]  people thought that KDFs are much better suited
[07:03.560 --> 07:06.740]  than just simply storing things as plain hashes
[07:06.740 --> 07:08.680]  or salted hashes.
[07:08.680 --> 07:12.000]  It can be much better safeguarded at that time.
[07:13.500 --> 07:16.040]  And thus, this whole philosophy of using
[07:16.040 --> 07:20.980]  key derivation functions, KDFs, came into picture.
[07:21.160 --> 07:23.740]  So, simply how this works?
[07:24.360 --> 07:26.580]  Well, underlying, there is still an algorithm
[07:26.580 --> 07:29.540]  which is used, which is iterated hundreds
[07:29.540 --> 07:32.120]  and thousands and sometimes a couple of millions
[07:32.120 --> 07:35.740]  of times on a basic crypto-primitive.
[07:36.060 --> 07:39.980]  These types of functions are called adaptive functions.
[07:41.040 --> 07:43.780]  And then the much more mature ones are throwing
[07:43.960 --> 07:46.140]  a little or sometimes a lot of memory
[07:46.140 --> 07:48.980]  at this iterative process, which increases
[07:48.980 --> 07:53.880]  or almost quadruples the cost of cracking offline.
[07:53.880 --> 07:56.920]  And still, it takes password and salt,
[07:56.920 --> 07:59.720]  gives out a fixed length hash,
[07:59.720 --> 08:03.200]  and it's the work factor which can be tuned
[08:03.200 --> 08:06.100]  based on different hardware your application
[08:06.100 --> 08:08.160]  is being deployed.
[08:09.540 --> 08:13.260]  So, that's going to be the base of our talk today
[08:13.260 --> 08:16.140]  to protect or safeguard against most
[08:16.140 --> 08:18.100]  of the offline cracking mechanisms.
[08:20.220 --> 08:23.300]  Some design considerations I'd like to point out
[08:23.300 --> 08:27.640]  here is try to save your password hash
[08:27.640 --> 08:30.480]  and salt in completely different databases,
[08:30.480 --> 08:33.800]  or a distributed database, or maybe a database
[08:33.800 --> 08:35.280]  and a property file or something.
[08:35.280 --> 08:37.560]  They should not be close to each other.
[08:39.200 --> 08:41.580]  Obviously, you're not going to store a password
[08:41.580 --> 08:43.420]  in plain text anymore.
[08:44.620 --> 08:48.260]  I can even go as far as saying that maybe
[08:48.260 --> 08:50.840]  we can have different work factors
[08:50.840 --> 08:55.100]  for different information we are trying to safeguard,
[08:55.100 --> 08:57.260]  or even different logins can have different
[08:57.360 --> 09:00.240]  work factors and storing work factor
[09:00.240 --> 09:02.040]  per login, obviously.
[09:04.820 --> 09:09.020]  With increasing cost in memory or CPU,
[09:09.880 --> 09:13.160]  it should be routinely checked that work factors
[09:13.160 --> 09:16.280]  are incremented accordingly to keep making
[09:16.280 --> 09:19.080]  password offline cracking expensive.
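The idea of storing a work factor per login and raising it over time can be sketched as below. This is a minimal illustration, assuming PBKDF2-HMAC-SHA256 from Python's standard library; the names `StoredCredential`, `hash_password`, `verify`, and `needs_upgrade` are hypothetical, and the salt is kept in the same record only for brevity.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class StoredCredential:
    iterations: int  # work factor used when this hash was created
    salt: bytes
    digest: bytes


def hash_password(password: str, iterations: int) -> StoredCredential:
    """Hash with a fresh random salt and remember the work factor used."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return StoredCredential(iterations, salt, digest)


def verify(password: str, cred: StoredCredential) -> bool:
    """Recompute with the stored work factor and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), cred.salt, cred.iterations
    )
    return hmac.compare_digest(candidate, cred.digest)


def needs_upgrade(cred: StoredCredential, current_iterations: int) -> bool:
    """True when the stored hash was made with an older, lower work factor."""
    return cred.iterations < current_iterations
```

A typical use would be to call `needs_upgrade` after a successful `verify` and, if it returns true, re-hash the just-entered password with the current work factor.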
[09:20.160 --> 09:25.480]  Lastly, how expensive is tolerable or acceptable?
[09:25.780 --> 09:30.340]  Industry standards are any kind of interactive login,
[09:30.440 --> 09:34.740]  a latency of around one second is very acceptable.
[09:34.740 --> 09:37.240]  So make sure you tune your work factors
[09:37.240 --> 09:40.320]  in a way that the output is calculated
[09:40.320 --> 09:42.200]  around a second.
[09:42.200 --> 09:46.400]  This is acceptable latency, and it still increases
[09:46.400 --> 09:49.720]  the offline cracking time by a huge margin.
[09:50.500 --> 09:53.600]  And lastly, if you are using your password
[09:54.080 --> 09:56.400]  or trying to save a password which is not going
[09:56.400 --> 09:58.680]  to be involved in interactive logins,
[09:58.680 --> 10:01.560]  like, for example, your hard disk encryption,
[10:02.060 --> 10:04.460]  a latency of around five to six seconds
[10:04.460 --> 10:07.760]  is quite acceptable in that scenario.
[10:08.000 --> 10:11.060]  So those are the things I'd like to say
[10:11.060 --> 10:14.700]  about key derivation functions in general.
[10:16.420 --> 10:19.180]  Let's start talking about different
[10:19.180 --> 10:22.380]  key derivation functions in existence,
[10:22.380 --> 10:25.400]  starting with adaptive functions.
[10:25.880 --> 10:29.220]  One of the oldest and the most widely adopted
[10:29.220 --> 10:31.860]  function was PBKDF2.
[10:31.860 --> 10:34.100]  Again, it came into picture because
[10:34.100 --> 10:38.540]  they really needed to generate keying materials.
[10:39.080 --> 10:43.180]  This function is the only government-approved
[10:43.180 --> 10:45.140]  function right now, so if you really have to
[10:45.140 --> 10:49.300]  comply by government standards, I really wish you don't,
[10:49.300 --> 10:53.240]  then this is probably your only option, unfortunately.
[10:53.240 --> 10:55.860]  This function is also used as
[10:55.860 --> 10:58.920]  different crypto-primitive blocks for
[10:58.920 --> 11:01.860]  other modern functions
[11:01.860 --> 11:04.200]  these days.
[11:05.640 --> 11:08.360]  Let's see how this actually works internally.
[11:08.840 --> 11:11.020]  So just like our generic KDF
[11:11.020 --> 11:14.920]  working, it still takes a password, a salt,
[11:14.920 --> 11:18.140]  gives out a password hash. You can actually configure the
[11:18.140 --> 11:20.960]  size you expect out of a password hash. This feature
[11:20.960 --> 11:23.900]  was more for... because there is always
[11:23.900 --> 11:26.620]  requirement of a fixed key size for any kind of
[11:26.620 --> 11:28.920]  block cipher being used for.
[11:29.780 --> 11:33.660]  The work factor is in terms of iteration count.
[11:36.620 --> 11:40.200]  Let's talk a little bit about the internal working of this algorithm.
[11:40.620 --> 11:42.540]  What it does is, the
[11:42.540 --> 11:45.120]  crypto-primitive it iterates over is
[11:45.960 --> 11:48.800]  a pseudo-random function, usually an HMAC.
[11:49.900 --> 11:52.320]  Based on the desired output length
[11:52.320 --> 11:55.220]  and the output size of the internal hash being used
[11:55.220 --> 11:57.720]  in the HMAC, different blocks are
[11:57.720 --> 12:00.720]  generated and each block is iterated iteration-count
[12:00.720 --> 12:03.840]  number of times. The output is concatenated
[12:03.840 --> 12:06.180]  and that's your password hash.
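The block-and-iterate structure just described can be written out as a short Python sketch. It follows the PBKDF2 construction with HMAC as the pseudo-random function; the function name `pbkdf2_sketch` is made up for illustration, and production code should call `hashlib.pbkdf2_hmac` rather than a hand-rolled loop.

```python
import hashlib
import hmac
import math


def pbkdf2_sketch(password: bytes, salt: bytes, iterations: int, dklen: int,
                  hash_name: str = "sha256") -> bytes:
    hlen = hashlib.new(hash_name).digest_size
    blocks = math.ceil(dklen / hlen)  # number of output blocks needed
    out = b""
    for i in range(1, blocks + 1):
        # First round: HMAC over the salt plus a 4-byte big-endian block index.
        u = hmac.new(password, salt + i.to_bytes(4, "big"), hash_name).digest()
        t = int.from_bytes(u, "big")
        # Remaining rounds: feed each HMAC output back in and XOR the results.
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hash_name).digest()
            t ^= int.from_bytes(u, "big")
        out += t.to_bytes(hlen, "big")
    # Concatenate the blocks and truncate to the requested length.
    return out[:dklen]
```

Note that asking for more output than one hash length means running the whole iteration loop again for every extra block.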
[12:07.840 --> 12:10.080]  So you will see things a little bit
[12:10.080 --> 12:13.300]  in green color here. What I have done is
[12:13.300 --> 12:16.320]  I have written a tool which will do
[12:16.320 --> 12:18.940]  parameter tuning for me based on the
[12:18.940 --> 12:22.420]  rules of thumb I mentioned earlier about having
[12:22.680 --> 12:25.040]  a password being calculated in roughly 1 second
[12:25.040 --> 12:28.180]  for interactive logins and roughly 5 seconds
[12:28.180 --> 12:31.820]  for any kind of non-interactive logins.
[12:31.880 --> 12:34.260]  So since PBKDF is government
[12:34.260 --> 12:37.340]  approved, they say the iteration count or the work factor for
[12:37.340 --> 12:40.480]  this algorithm should be around 10,000, which is
[12:40.480 --> 12:44.340]  way lower. Please don't do that. Please increase something.
[12:44.380 --> 12:46.500]  For reasonable hardware
[12:46.500 --> 12:49.500]  on which most of the typical web applications would
[12:49.500 --> 12:51.660]  be deployed in today's time,
[12:53.180 --> 12:54.700]  an EC2 T2 instance with
[12:55.340 --> 12:58.080]  roughly 8 GB of RAM and
[12:59.020 --> 13:01.640]  x86 architecture. I ran
[13:01.640 --> 13:04.160]  this tool and the number of iterations
[13:04.160 --> 13:07.080]  were around 1.5 million for
[13:07.080 --> 13:10.400]  just 1 second of password calculation. Just imagine
[13:10.400 --> 13:13.580]  how off the government standards are here.
[13:13.660 --> 13:16.280]  So I'd highly encourage you, whichever algorithm
[13:16.280 --> 13:19.420]  you decide to use among all the algorithms we are going to talk
[13:19.420 --> 13:22.100]  about. Please run some kind of a tuning
[13:22.100 --> 13:25.040]  utility. Play around with your parameters.
[13:25.040 --> 13:27.660]  I'll be open sourcing this tool anyways. You can
[13:27.660 --> 13:30.980]  feel free to grab it and run it on your
[13:30.980 --> 13:33.600]  deployment hardware to tune your work factors
[13:33.600 --> 13:34.840]  accordingly.
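The tuning tool mentioned in the talk is not reproduced here. As a rough stand-in, the following hypothetical Python sketch times one PBKDF2 run and scales the iteration count linearly toward a target latency; `calibrate_pbkdf2` and its parameters are assumptions for illustration only.

```python
import hashlib
import time


def calibrate_pbkdf2(target_seconds: float, probe_iterations: int = 50_000) -> int:
    """Estimate an iteration count that takes about target_seconds on this machine."""
    password, salt = b"calibration-password", b"calibration-salt"
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, probe_iterations)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    # PBKDF2 time grows linearly with the iteration count, so scale the probe.
    return max(1, int(probe_iterations * target_seconds / elapsed))
```

Running `calibrate_pbkdf2(1.0)` on the deployment hardware gives a starting point for interactive logins; the result varies between machines and runs, so it is worth repeating the measurement a few times.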
[13:37.200 --> 13:39.800]  Some things you should be worried about when
[13:39.800 --> 13:43.380]  choosing this algorithm: please choose your
[13:43.380 --> 13:45.320]  password output length
[13:45.980 --> 13:49.160]  less than or equal to the output size of the internal hash you are
[13:49.160 --> 13:52.400]  using. The reason being, anything longer unnecessarily
[13:52.400 --> 13:55.060]  takes a lot of processing power
[13:55.060 --> 13:58.200]  for no value add. So that's
[13:58.200 --> 13:59.980]  one of my suggestions.
[14:00.200 --> 14:03.600]  In this algorithm, you can just
[14:04.520 --> 14:07.400]  configure the CPU time involved. There is no
[14:07.400 --> 14:10.340]  memory involved. It is still not at all resilient
[14:10.340 --> 14:13.360]  towards any kinds of brute forcing attempts because
[14:13.360 --> 14:16.160]  of the highly parallelized nature of very
[14:16.160 --> 14:19.340]  cheaply rentable GPUs in today's
[14:19.340 --> 14:21.940]  time. If you don't have to comply
[14:21.940 --> 14:25.240]  with the government standards, please move on.
[14:29.380 --> 14:32.480]  Next, a notable mention, Bcrypt
[14:32.480 --> 14:36.900]  is very commonly used.
[14:36.900 --> 14:39.080]  It is based on a
[14:39.080 --> 14:42.480]  already deprecated symmetric
[14:42.480 --> 14:46.800]  cipher called Blowfish.
[14:46.960 --> 14:48.760]  It involves a little bit of
[14:48.760 --> 14:51.480]  memory for its internal working. So that's what makes it
[14:51.480 --> 14:54.080]  slightly better than PBKDF2. But again, the
[14:54.080 --> 14:55.220]  amount of memory it will need to
[14:55.220 --> 14:59.820]  use is not tunable by the user.
[15:00.000 --> 15:02.000]  And again, it was designed for
[15:02.000 --> 15:04.980]  generating key materials and not with storing
[15:04.980 --> 15:08.400]  secrets or password hashing in mind.
[15:08.420 --> 15:10.840]  So how does this algorithm work internally?
[15:10.840 --> 15:14.200]  Again, we have a password, a fixed size salt,
[15:14.240 --> 15:17.620]  a password output as a password hash.
[15:18.800 --> 15:19.900]  Iteration count
[15:19.900 --> 15:22.240]  is specified in logarithmic
[15:23.060 --> 15:26.200]  way. So this is going to be 2 raised to 14 number of
[15:26.200 --> 15:29.360]  iterations. And internally, how it works is it has
[15:29.360 --> 15:30.760]  this very expensive
[15:33.980 --> 15:36.240]  Blowfish key setup process
[15:37.080 --> 15:38.380]  which involves some
[15:38.380 --> 15:41.300]  memory. So it iterates around that key setup
[15:41.300 --> 15:44.400]  and the output is again iterated
[15:44.400 --> 15:46.860]  through the normal
[15:47.940 --> 15:50.300]  Blowfish algorithm and the
[15:50.300 --> 15:53.220]  output is given to the
[15:53.220 --> 15:54.260]  caller.
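Because the cost is given as an exponent, each increment doubles the work. A tiny Python sketch of that arithmetic follows; `bcrypt_rounds` is a hypothetical helper, not part of any bcrypt library.

```python
def bcrypt_rounds(cost: int) -> int:
    """Number of key-setup iterations implied by a logarithmic cost factor."""
    return 2 ** cost


# Raising the cost by one doubles the number of iterations.
for cost in (10, 12, 14):
    print(cost, bcrypt_rounds(cost))
```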
[15:56.760 --> 15:59.560]  It still is a little better than PBKDF2
[15:59.560 --> 16:02.280]  because of the internal RAM usage, but it is
[16:03.340 --> 16:05.520]  still very susceptible to brute-forcing
[16:05.520 --> 16:08.360]  attacks, maybe slightly more expensive than the
[16:08.360 --> 16:11.720]  previous one. And I don't understand
[16:11.720 --> 16:14.580]  why to use bcrypt, though I've seen a lot of usages
[16:14.580 --> 16:17.620]  of that. If you don't have to even comply
[16:17.620 --> 16:20.740]  by the government standards, why not use the more
[16:22.420 --> 16:24.180]  modern memory-hard
[16:24.180 --> 16:27.500]  functions. Okay, let's start talking about
[16:28.200 --> 16:31.100]  a few different memory-hard functions.
[16:31.640 --> 16:33.280]  The first one being
[16:33.280 --> 16:36.080]  scrypt. It's one of the
[16:36.080 --> 16:38.620]  earliest-generation functions with memory-hardness
[16:39.180 --> 16:41.260]  built in.
[16:42.260 --> 16:44.980]  It is seeing increased adoption in
[16:44.980 --> 16:48.400]  cryptocurrencies mainly due to the
[16:48.400 --> 16:51.360]  nature of it. In cryptocurrencies, they don't really need
[16:51.360 --> 16:53.980]  to worry about time-memory
[16:53.980 --> 16:56.960]  trade-off attacks like offline cracking needs to
[16:56.960 --> 17:00.200]  and cryptocurrencies need to be
[17:00.200 --> 17:03.200]  more worried about side-channel attacks, which is not the domain
[17:03.200 --> 17:05.800]  for offline cracking.
[17:05.980 --> 17:09.060]  So it's mainly due to the nature of the applications
[17:09.060 --> 17:11.900]  it's widely adopted in cryptocurrencies. It's not
[17:11.900 --> 17:14.920]  like it's better or less
[17:14.920 --> 17:18.100]  secure for other applications.
[17:19.220 --> 17:20.660]  It was still designed
[17:20.660 --> 17:24.120]  for keying material, but it saw a lot
[17:24.120 --> 17:26.540]  of promising breakthroughs
[17:26.540 --> 17:30.160]  in being used against offline cracking.
[17:31.560 --> 17:33.160]  Okay, let's see how this
[17:33.160 --> 17:35.960]  works. It still has a password, a salt,
[17:35.960 --> 17:39.220]  and an output password hash. We can still configure
[17:39.220 --> 17:42.180]  the length we desire out of the output.
[17:43.120 --> 17:45.400]  And if you see, the number of work factors has
[17:45.400 --> 17:49.340]  increased from 1 to 3 at this point.
[17:49.560 --> 17:51.500]  And not all are going to be
[17:52.340 --> 17:54.320]  still giving us all the freedom
[17:54.320 --> 17:57.140]  to tune all kinds of resources, but that's a huge step
[17:57.140 --> 18:00.760]  in the right direction, in my opinion, with memory-hardness involved.
[18:01.760 --> 18:03.240]  There is parallelization. You can
[18:03.240 --> 18:06.440]  parallelize the computation. I would like to note
[18:06.440 --> 18:09.280]  here that not all implementations
[18:09.280 --> 18:12.620]  give this control to the user. They still
[18:12.620 --> 18:15.480]  do it the way it depends
[18:15.480 --> 18:16.900]  on the implementation.
[18:18.480 --> 18:21.860]  You would note that there is only one parameter
[18:21.860 --> 18:24.400]  which will control both the CPU resources
[18:24.400 --> 18:27.540]  as well as the memory involved. Basically,
[18:27.540 --> 18:29.400]  it does not differentiate between
[18:30.210 --> 18:33.420]  both the resources we can throw at this algorithm.
[18:33.420 --> 18:36.500]  So basically, going down the line, if we decide
[18:36.500 --> 18:39.840]  that memory is getting cheaper, so we should increase the amount of memory
[18:39.840 --> 18:43.540]  being used in the algorithm, we don't really have a choice here.
[18:43.840 --> 18:45.800]  And finally is the block size which is used
[18:45.800 --> 18:48.620]  internally. Most typical values are 8 or
[18:48.620 --> 18:50.940]  16. It does not make a huge difference.
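The three scrypt parameters described above can be tried with Python's standard library, assuming it was built against an OpenSSL version that provides scrypt. The values below are common illustrative choices rather than recommendations from the talk, and `scrypt_memory_bytes` is a hypothetical helper for the usual 128 * N * r memory estimate.

```python
import hashlib
import os

N = 2 ** 14  # combined CPU/memory cost; must be a power of two
R = 8        # block size
P = 1        # parallelization


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of the main working array: 128 * N * r bytes."""
    return 128 * n * r


salt = os.urandom(16)
key = hashlib.scrypt(b"example password", salt=salt, n=N, r=R, p=P,
                     maxmem=64 * 1024 * 1024, dklen=32)
```

With these values the working array is about 16 MiB. Doubling `N` doubles both the memory and the CPU work, which is the coupling between the two resources pointed out above.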
[18:53.300 --> 18:54.720]  Let's talk a little bit about
[18:54.720 --> 18:56.820]  how the algorithm works internally.
[18:57.020 --> 19:01.620]  We talked about PBKDF2 earlier;
[19:01.620 --> 19:04.860]  it is used internally with HMAC to generate a
[19:04.860 --> 19:07.980]  fixed size block which is looped through this
[19:07.980 --> 19:10.680]  crazy memory array with a lot of
[19:10.680 --> 19:13.580]  stream ciphers and XORing going on for
[19:13.580 --> 19:17.400]  the iteration count number of times.
[19:17.400 --> 19:20.040]  And again, the output is made of fixed size
[19:20.040 --> 19:22.480]  by PBKDFing it again and
[19:22.480 --> 19:27.460]  the output is passed to the caller.
[19:27.460 --> 19:31.600]  If you are choosing this,
[19:33.140 --> 19:35.700]  it raises the cost of any kind of brute forcing
[19:35.700 --> 19:38.600]  attempts by a huge margin compared to
[19:38.600 --> 19:40.520]  adaptive functions.
[19:40.820 --> 19:44.000]  But the way the memory is being used
[19:44.000 --> 19:46.660]  by the algorithm internally, it is still using
[19:46.660 --> 19:49.800]  adjacent memory arrays in the consecutive
[19:49.800 --> 19:53.160]  operations. What I am trying to say is
[19:53.160 --> 19:56.780]  depending on the value of the password,
[19:56.780 --> 19:59.380]  the consecutive arrays are chosen.
[19:59.380 --> 20:02.180]  So this is always going to be a predefined
[20:03.020 --> 20:05.520]  sequence of memory arrays based on
[20:05.520 --> 20:08.660]  the input password. And what this opens
[20:08.660 --> 20:11.200]  us up to is side channel
[20:12.360 --> 20:13.640]  attacks.
[20:15.380 --> 20:17.600]  Again, as we spoke about earlier,
[20:17.600 --> 20:20.140]  there is no way we can tune the CPU and memories
[20:20.140 --> 20:23.320]  independently. And since there is a lot of
[20:23.320 --> 20:26.460]  crypto involved in this algorithm,
[20:26.460 --> 20:29.480]  we have HMAC, we have PBKDF, we have
[20:29.480 --> 20:32.380]  Salsa stream ciphers, we have
[20:32.380 --> 20:35.920]  XORing, we again have PBKDF.
[20:35.920 --> 20:38.120]  As the amount of crypto involved
[20:38.120 --> 20:41.440]  increases, the implementations become more
[20:41.440 --> 20:44.440]  complicated, more error prone, the
[20:44.440 --> 20:46.840]  cryptanalysis becomes more complicated and
[20:47.300 --> 20:50.620]  it's not as sleek, basically.
[20:51.360 --> 20:53.760]  So these are the things
[20:53.760 --> 20:57.060]  you should think about if you decide to go with scrypt.
[20:57.060 --> 21:00.320]  Still a huge step ahead of the adaptive functions.
[21:00.320 --> 21:03.120]  It still increases the offline cracking cost
[21:03.120 --> 21:05.940]  by a huge margin, almost quadruple
[21:05.940 --> 21:08.060]  based on the number of iterations.
[21:08.960 --> 21:11.780]  It's a great choice. Well, don't they say
[21:11.780 --> 21:14.720]  save the best algorithm for last? And that's
[21:14.720 --> 21:16.260]  what Argon2 is.
[21:17.780 --> 21:20.860]  Around 2015, there was this competition which
[21:20.860 --> 21:24.500]  took place and the goal of that competition
[21:24.500 --> 21:27.400]  was to come out with an algorithm which is
[21:27.400 --> 21:30.100]  specifically suited for storing secrets
[21:31.620 --> 21:33.500]  and safeguard it against
[21:33.500 --> 21:36.060]  any kind of offline cracking, obviously.
[21:36.780 --> 21:38.880]  It wasn't like before that
[21:38.880 --> 21:41.960]  no one was thinking about how to safely store
[21:41.960 --> 21:45.580]  information. Well, the breaches were happening.
[21:45.580 --> 21:47.980]  It was more like a cat and mouse race
[21:50.280 --> 21:52.760]  and the industry started looking at
[21:52.760 --> 21:55.720]  what current tools are there in the
[21:55.720 --> 21:58.660]  cryptography arsenal which we can apply to this
[21:58.660 --> 22:01.640]  particular problem quickly so that
[22:01.640 --> 22:04.440]  they can safeguard their information
[22:04.440 --> 22:06.940]  as long as they can from any kind of
[22:06.940 --> 22:09.320]  offline cracking mechanism.
[22:11.140 --> 22:13.660]  Well, Argon2, the winner of this competition,
[22:13.660 --> 22:16.700]  is obviously going to take care of all those existing
[22:16.700 --> 22:19.700]  attacks we have been talking about so far.
[22:19.700 --> 22:21.240]  It's very resilient against
[22:23.120 --> 22:25.920]  any brute forcing or dictionary attack.
[22:25.920 --> 22:28.300]  It's very hard to just parallelize it and
[22:28.300 --> 22:31.760]  run computations and start cracking passwords in
[22:31.760 --> 22:34.860]  minutes. They were also very resilient
[22:34.860 --> 22:37.460]  about the modern computer architecture
[22:37.460 --> 22:40.720]  which started maturing, became much cheaper and
[22:40.720 --> 22:43.360]  they were very, very cognizant about it will keep getting
[22:43.360 --> 22:46.320]  cheaper with time. So it's obviously going to be
[22:46.320 --> 22:49.540]  very resilient against any application
[22:49.540 --> 22:52.380]  specific integrated circuit architectures or even
[22:52.380 --> 22:54.160]  FPGA arrays.
[22:55.140 --> 22:58.360]  A few years ago, there were limited
[22:58.360 --> 23:01.860]  implementations of this algorithm out of a
[23:01.860 --> 23:05.000]  direct cryptographic library. But that situation
[23:05.000 --> 23:07.880]  has greatly changed. So congratulations, you don't have to
[23:07.880 --> 23:10.620]  implement this algorithm yourself. You can pick up any
[23:10.620 --> 23:13.500]  programming language, any library in that language
[23:13.500 --> 23:15.940]  and for the most
[23:17.500 --> 23:19.880]  amount of time, you would be having
[23:19.880 --> 23:23.000]  this implementation ready to be plugged and played
[23:23.000 --> 23:25.500]  in your application.
[23:26.880 --> 23:29.160]  So let's see how this algorithm works.
[23:29.160 --> 23:31.760]  Well, you still have your passwords and salts. You still
[23:31.760 --> 23:35.000]  will get a fixed length hashed password.
[23:35.000 --> 23:37.580]  In terms of work factors, you have
[23:37.580 --> 23:40.820]  three parameters now. One is obviously
[23:40.820 --> 23:43.860]  parallelization. You can configure it based on number of
[23:43.860 --> 23:46.420]  CPU cores you have at your disposal
[23:46.420 --> 23:49.920]  and you can tune the amount of CPU
[23:49.920 --> 23:52.820]  needed in terms of number of iterations and
[23:53.120 --> 23:56.280]  the memory which can be used in this algorithm
[23:56.280 --> 23:58.940]  in terms of memory size.
[23:58.940 --> 24:01.760]  So it basically decouples the two
[24:01.760 --> 24:04.200]  resource utilizations.
[24:04.200 --> 24:07.240]  Very unlike scrypt and one of the huge
[24:07.240 --> 24:08.840]  advantages.
[24:09.940 --> 24:12.840]  There is some cryptanalysis done on
[24:12.840 --> 24:16.160]  Argon2 algorithms which makes
[24:16.160 --> 24:18.980]  any iterations below 10
[24:19.800 --> 24:21.840]  a little susceptible but it is more
[24:21.840 --> 24:24.900]  theoretical cryptanalysis so don't freak out but
[24:24.900 --> 24:28.180]  choose a parameter which is at least greater than 10 and
[24:28.180 --> 24:30.180]  that's the reason.
[24:31.960 --> 24:34.420]  Talking about modes of operation
[24:34.420 --> 24:37.540]  which is crucial. There are two
[24:37.540 --> 24:40.900]  main types of modes in this algorithm.
[24:40.900 --> 24:43.600]  One is data-dependent mode which is what
[24:43.600 --> 24:46.600]  was there in the previous scrypt algorithm as well
[24:46.600 --> 24:49.600]  and a data-independent mode which is the
[24:49.600 --> 24:52.500]  best option for password storage and the third
[24:52.500 --> 24:55.240]  one is a hybrid of both the modes working
[24:55.240 --> 24:56.340]  together.
[24:58.240 --> 25:00.940]  How do these modes work? For that, let's start
[25:00.940 --> 25:03.600]  talking about the internal crypto
[25:04.500 --> 25:07.700]  of this algorithm. So how it goes is
[25:07.700 --> 25:11.120]  it first computes a
[25:11.120 --> 25:14.340]  hash of password salt and all these different
[25:14.340 --> 25:16.960]  parameters. Any hash can be used
[25:16.960 --> 25:19.900]  usually it is Blake2.
[25:19.900 --> 25:23.580]  And based on the value of the hash
[25:23.580 --> 25:25.420]  sorry before that there is this memory
[25:25.420 --> 25:28.960]  array which is dedicated to this based on the
[25:28.960 --> 25:31.760]  memory size we give. So just imagine it as rows
[25:31.760 --> 25:34.980]  and columns being populated iteratively for a
[25:34.980 --> 25:38.840]  number of iterations of times.
[25:39.520 --> 25:41.640]  So how are data-dependent
[25:41.640 --> 25:44.580]  mode and data-independent mode different?
[25:44.580 --> 25:48.080]  Each memory array is being
[25:48.080 --> 25:50.440]  populated in sequence, but that sequence
[25:50.440 --> 25:53.080]  in data-dependent mode is decided by the value
[25:53.080 --> 25:56.240]  of the hash, which is dependent on the value of the
[25:56.240 --> 25:59.220]  input password. And for data-independent mode it is
[25:59.220 --> 26:02.360]  pseudo-random and independent of the password. So basically
[26:02.360 --> 26:05.580]  what it comes down to is that, in data-dependent mode, the sequence of memory
[26:05.580 --> 26:08.320]  array populations per iteration is actually
[26:08.320 --> 26:11.120]  based on the input password. And
[26:11.120 --> 26:14.420]  this particular feature of this algorithm makes it
[26:14.420 --> 26:16.920]  susceptible to side channel
[26:16.920 --> 26:20.420]  attacks which is okay for
[26:21.480 --> 26:24.720]  cryptocurrencies but not very comfortable
[26:24.720 --> 26:28.240]  for any kind of password storage.
[26:28.240 --> 26:30.300]  And that's why you should use the
[26:30.300 --> 26:33.640]  independent mode, where the sequence of memory
[26:33.640 --> 26:36.460]  array population is independent of the password, and it takes away
[26:36.460 --> 26:39.420]  even that issue which scrypt had
[26:39.420 --> 26:42.680]  and which these kinds of attacks
[26:42.680 --> 26:45.640]  might exploit. So some
[26:45.640 --> 26:48.640]  design considerations. You have to tune
[26:48.640 --> 26:51.920]  your parameters a little more carefully, mostly
[26:51.920 --> 26:54.680]  because you have a few more parameters to
[26:54.680 --> 26:58.040]  take care of. As we talked about,
[26:58.040 --> 27:01.160]  the data-dependent mode is more susceptible to
[27:01.160 --> 27:03.640]  any kind of side channel attacks, and
[27:04.240 --> 27:06.660]  there is this hybrid mode where
[27:06.660 --> 27:10.080]  the first part of it happens in the data-independent manner
[27:10.080 --> 27:13.420]  and the next half happens in the data-dependent manner.
[27:13.480 --> 27:15.800]  And with that we get the best of both worlds.
[27:15.800 --> 27:19.340]  We are resilient towards side channel attacks as well as
[27:19.340 --> 27:22.920]  we are much better with the time-memory trade-off issues.
[27:23.580 --> 27:25.160]  So that's what
[27:25.160 --> 27:27.380]  I would say: use Argon2,
[27:27.380 --> 27:30.980]  mainly in Argon2id mode. We already
[27:30.980 --> 27:34.120]  looked at what the best parameter options are for
[27:34.120 --> 27:37.100]  computing a password hash
[27:37.100 --> 27:38.900]  within a second.
[27:40.840 --> 27:43.200]  Since Argon2 is the algorithm
[27:43.200 --> 27:46.020]  I highly recommend for
[27:46.020 --> 27:49.920]  any kind of sensitive information storage,
[27:50.220 --> 27:52.280]  I'd like to quickly show how
[27:52.280 --> 27:54.800]  easy it is to just start using it
[27:54.800 --> 27:58.880]  with any implementation you have access to.
[28:00.000 --> 28:01.400]  Just to record
[28:01.400 --> 28:04.520]  this demo, I spun up an EC2 instance.
[28:04.520 --> 28:07.100]  These are the details of the instance. It's a typical
[28:07.100 --> 28:10.060]  standard t2.medium with 2
[28:10.060 --> 28:13.520]  CPUs and 4 GB of RAM.
[28:13.520 --> 28:16.160]  Memory currently available as of this moment is
[28:16.160 --> 28:19.220]  around 2.8 GB. It's using the x86
[28:19.220 --> 28:22.400]  architecture on a Linux kernel.
[28:23.160 --> 28:25.100]  So let's see how quick
[28:25.100 --> 28:28.400]  it is to start using this.
[28:29.580 --> 28:31.220]  Well, I am going to
[28:31.220 --> 28:33.540]  use NaCl's password hashing
[28:34.620 --> 28:37.340]  module, where Argon2id is
[28:37.340 --> 28:40.040]  supported. Well, a quick pro tip:
[28:40.040 --> 28:42.820]  whenever you need any
[28:43.170 --> 28:45.900]  crypto implementation,
[28:45.900 --> 28:48.940]  if you have a choice, always go for NaCl. It's
[28:48.940 --> 28:52.280]  very cleanly written by cryptographers rather than
[28:52.280 --> 28:55.580]  actual developers. They deprecate
[28:55.580 --> 28:58.300]  things which are no longer secure or for which better
[28:58.300 --> 29:01.360]  options are available out there. They quickly remove all those
[29:01.360 --> 29:04.060]  things from anyone's view. So chances of
[29:04.060 --> 29:07.280]  making wrong choices are eliminated.
[29:08.440 --> 29:10.040]  Okay, coming back to the
[29:10.040 --> 29:13.280]  code, we are going to use that module and just
[29:13.280 --> 29:16.160]  to keep track of time, we are going to use the time module.
[29:16.280 --> 29:18.300]  Start time is going to be
[29:18.980 --> 29:21.620]  current time. Let's start
[29:21.620 --> 29:23.940]  storing the password.
[29:27.360 --> 29:29.380]  First, let's use the
[29:30.340 --> 29:30.900]  module
[29:33.980 --> 29:35.260]  pwhash.argon2id.
[29:35.260 --> 29:38.060]  Why not use the hybrid one? Best of both
[29:38.060 --> 29:41.120]  worlds, right? And the string
[29:41.120 --> 29:43.960]  output for that. Giving the
[29:43.960 --> 29:45.040]  password
[29:49.450 --> 29:52.630]  a long one is better than a higher-entropy,
[29:52.630 --> 29:55.810]  shorter one. So that's what I go for.
[29:56.350 --> 29:58.730]  Ops limit is the number of iterations
[29:58.730 --> 30:01.530]  around the memory array. Let's go for 14;
[30:02.770 --> 30:04.210]  anything up to 10 is already
[30:05.190 --> 30:07.830]  theoretically cryptanalyzed. So anything
[30:07.830 --> 30:09.810]  above 10 is great.
[30:11.670 --> 30:13.950]  And the key thing, memory.
[30:14.530 --> 30:16.910]  It's again logarithmically given.
[30:17.330 --> 30:20.530]  We already know the answer from our tuning tool.
[30:20.750 --> 30:22.530]  So let's see. This is
[30:23.090 --> 30:26.190]  as short and sweet as it can get actually.
[30:26.570 --> 30:29.330]  And just to keep track of time,
[30:29.330 --> 30:31.630]  let's see how much it is.
[30:32.350 --> 30:35.750]  And hopefully this compiles and runs actually.
[30:36.850 --> 30:38.730]  Okay, so it took
[30:38.730 --> 30:40.990]  around 1.1 seconds.
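The demo described above can be approximated with the sketch below. It assumes the library on screen was PyNaCl (the `nacl.pwhash` module), which is a third-party package, and it assumes that the "27" mentioned for memory means 2**27 bytes (128 MiB), since PyNaCl takes `memlimit` in bytes. The function name `store_password` is hypothetical, and the timings will differ on other hardware.

```python
import time

try:
    import nacl.pwhash  # third-party: pip install pynacl
except ImportError:     # keep the sketch importable when PyNaCl is not installed
    nacl = None


def store_password(password: bytes, opslimit: int = 14, memlimit: int = 2 ** 27):
    """Return (encoded_hash, seconds_taken), or None if PyNaCl is unavailable."""
    if nacl is None:
        return None
    start = time.perf_counter()
    # argon2id.str() generates a random salt and returns an encoded string that
    # carries the algorithm, the parameters, the salt, and the hash together.
    encoded = nacl.pwhash.argon2id.str(password, opslimit=opslimit, memlimit=memlimit)
    return encoded, time.perf_counter() - start
```

Verifying a login attempt later would be a call to `nacl.pwhash.argon2id.verify(encoded, password)`, which reads the parameters back out of the encoded string.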
[30:41.990 --> 30:45.510]  Can we go any higher? Let's see.
[30:45.550 --> 30:47.630]  Going from 27 to 28,
[30:48.970 --> 30:50.170]  which is
[30:52.450 --> 30:53.750]  2.3 seconds.
[30:53.750 --> 30:56.910]  I'll leave that choice to you at this point.
[30:56.910 --> 31:00.870]  This is how we should typically tune any parameters.
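The tuning step shown in the demo (raise the memory exponent, time it, stop when it gets too slow) can be sketched as a small loop. This is not the tuning tool from the talk: `tune` and `time_scrypt` are hypothetical names, and scrypt from Python's standard library stands in for Argon2 only because it needs no extra packages.

```python
import hashlib
import os
import time


def time_scrypt(log2_n: int, r: int = 8, p: int = 1) -> float:
    """Seconds taken by one scrypt call that uses roughly 128 * r * 2**log2_n bytes."""
    n = 2 ** log2_n
    needed = 128 * r * n
    start = time.perf_counter()
    hashlib.scrypt(b"a long demo passphrase", salt=os.urandom(16),
                   n=n, r=r, p=p, maxmem=needed + 2 ** 20, dklen=32)
    return time.perf_counter() - start


def tune(measure, target_seconds: float, exponents) -> int:
    """Return the largest exponent whose measured time stays within the target."""
    best = None
    for exponent in exponents:
        if measure(exponent) > target_seconds:
            break  # each step doubles the work, so later settings are slower still
        best = exponent
    if best is None:
        raise ValueError("even the smallest setting exceeds the target time")
    return best
```

On a real server this might be run as `tune(time_scrypt, 1.0, range(14, 21))`, on the same class of machine that will handle logins, and repeated whenever the hardware changes.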
[31:06.210 --> 31:06.810]  It's almost
[31:06.810 --> 31:09.950]  second nature to wonder how these functions actually
[31:09.950 --> 31:12.050]  compare against each other.
[31:12.610 --> 31:15.850]  There is not a lot of research done
[31:15.850 --> 31:19.110]  on this topic, about
[31:19.110 --> 31:22.230]  how to actually put a dollar value
[31:22.230 --> 31:24.650]  or a number of years on cracking a particular
[31:24.650 --> 31:27.090]  breach mechanism.
[31:27.790 --> 31:31.170]  There is no apples-to-apples comparison either, as
[31:31.170 --> 31:32.990]  you could imagine.
[31:33.210 --> 31:36.250]  But good notable work is being done by
[31:36.250 --> 31:39.950]  these two papers. One released in USENIX
[31:40.470 --> 31:42.950]  a couple of years ago and another one done
[31:42.950 --> 31:45.690]  in conjunction between Microsoft and
[31:45.690 --> 31:47.510]  Purdue University.
[31:48.030 --> 31:50.270]  In the most trivial way, what they do is
[31:50.270 --> 31:54.050]  they look at the latest hardware and the memory cost
[31:54.050 --> 31:57.490]  and just start calculations
[31:57.490 --> 31:58.970]  from there.
[31:59.930 --> 32:03.320]  And that's what I tried to do as of yesterday.
[32:04.370 --> 32:06.250]  So for adaptive functions
[32:06.250 --> 32:07.670]  it's just going to be again
[32:08.750 --> 32:11.990]  a huge list of disclaimers here before I actually
[32:11.990 --> 32:14.790]  talk about this very very cautiously actually.
[32:16.250 --> 32:18.550]  I'm assuming the keyword, the password,
[32:18.550 --> 32:19.990]  the salt, the output,
[32:19.990 --> 32:23.130]  all those things are the same across
[32:23.130 --> 32:26.230]  all these functions. I'm also assuming there is
[32:26.230 --> 32:28.430]  no electricity cost involved.
[32:31.590 --> 32:33.030]  Talking about
[32:33.030 --> 32:35.110]  these figures for adaptive functions
[32:35.110 --> 32:37.890]  it's just going to be as simple as number of iterations per
[32:37.890 --> 32:40.910]  cost of hardware. And the cost of
[32:40.910 --> 32:44.430]  hardware is much easier to calculate
[32:44.430 --> 32:47.370]  these days because most of the modern ASIC
[32:47.370 --> 32:51.210]  hardware for Bitcoin mining comes with those statistics.
[32:51.210 --> 32:53.670]  So looking at one of the leading
[32:53.670 --> 32:56.510]  pieces of hardware in that department, coming from
[32:56.510 --> 32:57.530]  Antminer,
[32:58.970 --> 33:02.630]  one of the best configured ones
[33:02.630 --> 33:05.650]  is around $2500 and it
[33:05.650 --> 33:08.370]  promises to do 110 trillion
[33:08.370 --> 33:11.690]  hashes per second. And a trillion is 10 raised to the power
[33:11.690 --> 33:14.310]  12, if you are wondering as I was.
[33:15.470 --> 33:17.170]  So using that
[33:17.170 --> 33:19.730]  configuration with the number of iterations,
[33:20.190 --> 33:23.510]  an adaptive function is going to take this much time.
[33:23.510 --> 33:26.570]  The point being, it's going to be extremely cheap
[33:26.570 --> 33:28.630]  to just crack these passwords,
[33:29.310 --> 33:32.450]  considering these decently priced machines
[33:32.450 --> 33:36.050]  are going to be in common people's hands very soon.
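The back-of-the-envelope estimate for adaptive functions can be written out as below. The hash rate and price are the figures quoted in the talk; the iteration count and the keyspace in the usage note are hypothetical, and the sketch generously assumes the miner's advertised hash rate carries over one-to-one to the KDF's inner hash, which a fixed-function mining ASIC would not do in practice.

```python
HASHES_PER_SECOND = 110 * 10 ** 12  # 110 trillion hashes per second, as quoted in the talk
MACHINE_COST_USD = 2500             # price quoted in the talk


def guesses_per_second(iterations: int, hash_rate: int = HASHES_PER_SECOND) -> int:
    """Password guesses per second if every guess costs `iterations` hashes."""
    return hash_rate // iterations


def seconds_to_exhaust(keyspace: int, iterations: int) -> float:
    """Worst-case seconds to try every candidate password in `keyspace`."""
    return keyspace / guesses_per_second(iterations)
```

Under these assumptions, a hypothetical 100,000-iteration setting leaves `guesses_per_second(100_000)` at 1.1 billion guesses per second, and `seconds_to_exhaust(36 ** 8, 100_000)` (all 8-character lowercase-plus-digit passwords) comes to roughly 43 minutes on one such machine.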
[33:36.890 --> 33:38.310]  And similarly for memory
[33:38.310 --> 33:41.290]  hard functions and I want to stress here that this is only
[33:41.290 --> 33:44.370]  for memory hard functions which is running in the data
[33:44.370 --> 33:47.610]  dependent mode where the array used
[33:47.610 --> 33:50.290]  for memory calculations
[33:51.010 --> 33:53.710]  is based on the input passwords.
[33:54.070 --> 33:56.430]  So here again the cost is going to be
[33:56.430 --> 33:59.490]  based more on memory, as well
[33:59.490 --> 34:02.010]  as the amount of time it takes, which
[34:02.010 --> 34:04.630]  pretty much quadruples with memory hardness,
[34:05.490 --> 34:08.650]  so it is going to be much more expensive compared
[34:08.650 --> 34:11.770]  to adaptive functions. And this is just for
[34:11.770 --> 34:14.770]  data dependent mode. For data independent mode
[34:14.770 --> 34:17.950]  well the cost might still be the same but the number
[34:17.950 --> 34:20.370]  of guesses is going to be exponential.
[34:20.710 --> 34:23.590]  So this is just sharing
[34:23.590 --> 34:26.750]  some statistics still in a more conservative
[34:26.750 --> 34:27.830]  tone.
[34:29.850 --> 34:32.950]  This is all I wanted to speak about today
[34:32.950 --> 34:35.730]  around offline cracking and the
[34:35.730 --> 34:38.790]  different mechanisms we can use to
[34:38.790 --> 34:42.030]  safeguard ourselves, key derivation functions
[34:42.030 --> 34:45.230]  being the key to that; how to tune
[34:45.230 --> 34:46.250]  different parameters;
[34:47.910 --> 34:51.210]  and what kind of design considerations you should be
[34:51.210 --> 34:53.970]  weighing while picking each one of them.
[34:55.190 --> 34:57.110]  Next I'd like to talk
[34:57.250 --> 34:59.910]  a little bit about how all this information
[34:59.910 --> 35:02.850]  can be mapped to storing any kind of secrets you
[35:02.850 --> 35:06.790]  need to. For that you need to sit down
[35:06.790 --> 35:09.730]  and do a little bit of threat modeling for your
[35:09.730 --> 35:12.290]  own usages. Something like
[35:12.290 --> 35:15.650]  what is sensitive to your business?
[35:15.650 --> 35:17.850]  Do you need to comply with any GDPR
[35:18.410 --> 35:21.310]  requirements? Are you storing any personally
[35:21.310 --> 35:23.310]  identifiable information?
[35:24.130 --> 35:27.530]  Can it be used for crafting further attacks against
[35:27.530 --> 35:29.490]  the users whose
[35:29.490 --> 35:32.710]  information might be breached?
[35:34.070 --> 35:36.270]  How are you storing that information?
[35:36.270 --> 35:39.730]  Are you storing it in a database? Which fields are involved in that database?
[35:39.730 --> 35:41.350]  All those things need to be
[35:41.350 --> 35:44.790]  thought about and you can easily map
[35:44.790 --> 35:47.970]  all these KDFs to work for your own
[35:47.970 --> 35:49.470]  needs.
[35:50.550 --> 35:53.010]  Lastly, you must have come across
[35:53.010 --> 35:56.690]  countless suggestions and
[35:56.690 --> 35:59.710]  password hygiene requirements or
[35:59.710 --> 36:03.590]  tips over years since it is such a
[36:03.590 --> 36:06.970]  key aspect of any kind of authentication
[36:06.970 --> 36:08.970]  mechanism.
[36:09.270 --> 36:12.230]  Just for the completeness of this talk, I'd like
[36:12.230 --> 36:15.070]  to say a few things about it. Always
[36:15.210 --> 36:18.230]  choose a unique password. Password managers
[36:18.230 --> 36:21.450]  are great. Please use those.
[36:21.450 --> 36:23.570]  Longer passwords are better than
[36:23.570 --> 36:26.250]  shorter, higher entropy ones.
[36:27.370 --> 36:30.390]  This is what I typically do. I store
[36:30.390 --> 36:33.490]  my passwords in a password manager.
[36:33.490 --> 36:36.230]  This is the configuration I use while generating any new
[36:36.230 --> 36:39.550]  password. I choose a longer password with a
[36:39.550 --> 36:41.910]  reasonable amount of entropy.
[36:41.910 --> 36:45.450]  The point I'm trying to make is there is a lot of crypto analysis
[36:45.450 --> 36:47.790]  done which points us
[36:47.790 --> 36:50.720]  towards the theory that longer
[36:50.720 --> 36:53.720]  passwords are better than shorter ones
[36:53.720 --> 36:56.500]  with higher entropy. A password whose length is
[36:56.500 --> 36:59.660]  25 or 30 characters is far better
[36:59.660 --> 37:02.340]  than a password of the standard
[37:02.340 --> 37:05.660]  8 or 10 character length with
[37:05.660 --> 37:08.820]  two special characters, one uppercase, and two
[37:08.820 --> 37:11.920]  digits, and those kinds of things.
[37:12.500 --> 37:13.840]  So do that.
[37:13.840 --> 37:17.520]  The website we looked at for
[37:17.520 --> 37:20.840]  data breach information has
[37:20.980 --> 37:23.840]  a nice API, exposed by,
[37:23.840 --> 37:26.800]  again, Troy Hunt. It would be great to
[37:26.800 --> 37:29.620]  use that API in your websites, or even
[37:29.620 --> 37:32.260]  password managers can start using it, so that
[37:32.260 --> 37:35.680]  if a password which has already been seen
[37:35.680 --> 37:38.980]  in a breach is being
[37:38.980 --> 37:41.180]  used, then it would be flagged, and
[37:41.880 --> 37:43.940]  that would go a long way.
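A breach lookup of that kind might look like the sketch below. The talk does not name the endpoint; this assumes the Pwned Passwords range API, where only the first five hex characters of the password's SHA-1 hash are sent and the service returns matching hash suffixes with counts. The function names and the User-Agent string are hypothetical.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"  # assumed endpoint, not named in the talk


def split_sha1(password: str):
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest; only the prefix is sent."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Find our suffix among the returned 'SUFFIX:COUNT' lines; 0 means not seen."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def breach_count(password: str) -> int:
    """Query the range endpoint; the full password and full hash never leave the machine."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(RANGE_URL + prefix,
                                     headers={"User-Agent": "kdf-talk-sketch"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_response(suffix, response.read().decode("utf-8"))
```

A signup form or password manager could then reject or warn on any password for which `breach_count` returns a non-zero value.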
[37:43.940 --> 37:47.180]  Finally, in conclusion, please
[37:47.180 --> 37:50.080]  embrace adaptive key derivation
[37:50.080 --> 37:53.020]  functions. Use memory hard functions based
[37:53.020 --> 37:56.020]  on your choice and comfort level with the amount
[37:56.020 --> 37:58.960]  of cryptanalysis done. Please don't
[37:58.960 --> 38:01.540]  do plain text or plain hashing or
[38:01.540 --> 38:04.940]  your own DIY designs. Those are all silly things in
[38:04.940 --> 38:08.060]  today's time. Consider upgrading your
[38:08.060 --> 38:10.900]  work factors based on resource
[38:10.900 --> 38:13.840]  costs out in the market.
[38:17.420 --> 38:19.300]  Consider having unique
[38:19.300 --> 38:20.800]  work factors for
[38:22.100 --> 38:24.140]  information you are trying to save or
[38:24.140 --> 38:27.860]  for each different user as well.
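Per-record work factors and later upgrades are easiest when the parameters are stored next to the hash. The sketch below is one possible layout, not something shown in the talk: the `pbkdf2_sha256$iterations$salt$hash` format and the function names are hypothetical, and PBKDF2 is used only because it ships with Python. Argon2 encoded strings carry their parameters in the same spirit.

```python
import hashlib
import hmac
import os


def hash_password(password: bytes, iterations: int) -> str:
    """Hash with a fresh salt and keep the work factor inside the stored string."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"


def verify_password(password: bytes, stored: str) -> bool:
    """Recompute with the parameters recorded for this particular entry."""
    _, iterations, salt_hex, derived_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac("sha256", password, bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(derived.hex(), derived_hex)


def needs_upgrade(stored: str, current_iterations: int) -> bool:
    """True when this entry was hashed with a weaker work factor than today's."""
    return int(stored.split("$")[1]) < current_iterations
```

On a successful login, an entry for which `needs_upgrade` is true can be rehashed with the current work factor while the plaintext password is still in memory.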
[38:28.200 --> 38:30.440]  Password hygiene suggestion: longer is better
[38:30.440 --> 38:33.520]  than shorter with higher entropy. Keep
[38:33.520 --> 38:36.440]  using password managers. Keep auditing your passwords for
[38:36.440 --> 38:39.820]  their existence in any breaches.
[38:40.540 --> 38:43.260]  And finally, I would like to conclude
[38:43.860 --> 38:46.480]  with a huge thanks for giving me this opportunity
[38:46.480 --> 38:49.740]  to share my thoughts. My DMs
[38:49.740 --> 38:53.340]  are always open for any interesting conversations.
[38:53.340 --> 38:55.580]  I blog a lot about these things
[38:55.580 --> 38:58.880]  in much more detail than what a
[38:58.880 --> 39:02.280]  45-minute slot is going to ever allow me to.
[39:02.280 --> 39:04.460]  And finally, you will find all
[39:04.460 --> 39:07.300]  these algorithms implemented in Java as well
[39:07.300 --> 39:12.220]  as the tuning tool on my GitHub repo.
[39:12.340 --> 39:13.580]  Thank you.
[39:41.370 --> 39:43.290]  That was the talk on how to store sensitive
[39:43.290 --> 39:46.150]  information in 2020 and the do's, don'ts,
[39:46.150 --> 39:48.490]  and how-tos of crypto building blocks used in Java.
[39:48.490 --> 39:51.730]  Thank you again to Mansi. We have them right here
[39:51.730 --> 39:54.270]  for a live Q&A, so please put your questions in the
[39:54.270 --> 39:56.830]  Discord CPV Q&A channel.
[39:58.750 --> 40:01.390]  So just to start off, what's the reason
[40:01.390 --> 40:05.670]  high memory usage is a requirement for KDFs?
[40:05.670 --> 40:07.790]  This is from Discord, by the way. I understand
[40:07.790 --> 40:10.610]  it helps make implementing ASICs harder, but I don't
[40:10.610 --> 40:11.870]  understand why.
[40:12.810 --> 40:16.490]  Sure. So ASICs, in my opinion, really started
[40:16.490 --> 40:19.590]  coming into existence because the
[40:19.590 --> 40:22.150]  underlying Bitcoin mining philosophy
[40:22.150 --> 40:25.830]  is doing the computation over and over again,
[40:25.830 --> 40:28.410]  much, much faster. The hardware
[40:28.410 --> 40:31.790]  is still expensive, and
[40:31.790 --> 40:35.150]  throwing memory at it will just make it much more
[40:35.150 --> 40:38.990]  expensive for widespread adoption,
[40:38.990 --> 40:41.390]  actually. And that would
[40:41.390 --> 40:44.710]  ultimately add to the cost of cracking passwords
[40:44.710 --> 40:46.430]  offline, in my opinion.
[40:47.090 --> 40:50.010]  Again, we have not seen the future. We don't know what quantum
[40:50.010 --> 40:53.230]  is going to get us, but for the foreseeable future,
[40:53.230 --> 40:55.970]  for whatever the current theoretical
[40:55.970 --> 40:59.090]  cryptanalysis says, that's what it is.
[40:59.810 --> 41:02.570]  Awesome. We have another question.
[41:03.110 --> 41:05.370]  When memory is the bottleneck, is there still
[41:05.370 --> 41:08.390]  an advantage to using ASICs, or does it revert to only
[41:08.390 --> 41:11.330]  negligible gain over general-purpose CPUs?
[41:12.190 --> 41:13.730]  Well, we still have a huge
[41:13.730 --> 41:17.210]  iteration factor, right? So it's a combination of both.
[41:17.210 --> 41:20.010]  So a general-purpose CPU can't be
[41:20.010 --> 41:23.130]  that highly parallelized with the amount of memory required
[41:23.130 --> 41:25.630]  for each thread. So, yeah.
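The point about memory limiting parallelism comes down to simple division. A minimal sketch, with a hypothetical function name, using the demo machine's 4 GB of RAM and assuming 2**27 bytes (128 MiB) per guess:

```python
def max_parallel_guesses(total_memory_bytes: int, memory_per_guess_bytes: int, cores: int) -> int:
    """Concurrent guesses are capped by whichever runs out first: cores or memory."""
    return min(cores, total_memory_bytes // memory_per_guess_bytes)
```

With 4 GiB of memory and 128 MiB per guess, even hardware with 1,000 compute units could keep only 32 guesses in flight at once; the remaining units would sit idle waiting for memory.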
[41:27.290 --> 41:29.190]  Cool. Another question that we have
[41:29.190 --> 41:32.130]  is, does a salt need to be just
[41:32.130 --> 41:35.010]  unique instead of being that big to avoid having two
[41:35.010 --> 41:38.170]  passwords hash to the same thing? This is in relation to 64
[41:38.170 --> 41:40.410]  bits versus, say, 128 bits.
[41:40.730 --> 41:44.290]  Sure. Well, 64 bits is, like, the pure
[41:44.290 --> 41:47.190]  minimum requirement anyways from a standard
[41:47.190 --> 41:50.170]  which was written, like, at least half a dozen... I mean, at least
[41:50.170 --> 41:53.550]  five or six years ago. I don't remember right now.
[41:53.550 --> 41:56.090]  Actually, even more. This is PBKDF2, probably.
[41:56.750 --> 41:59.390]  So, I mean, a little bit more salt.
[41:59.390 --> 42:02.190]  128 is not unacceptable. It's not going to add
[42:02.190 --> 42:05.390]  hugely to the processing power. Yeah, it's
[42:05.390 --> 42:08.170]  not a hard requirement. Even CSPRNG
[42:08.710 --> 42:11.130]  is, like, maybe a little too much crypto
[42:11.130 --> 42:13.950]  kind of a situation. So, yeah.
[42:14.050 --> 42:17.530]  It's a little bit on the higher side, but that's okay, in my opinion.
[42:17.530 --> 42:20.790]  It does not add to the computation at all.
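To make the salt discussion concrete, here is a minimal sketch of generating a 128-bit salt and seeing its effect. The function names and the iteration count are hypothetical (the count is kept low so the example runs quickly), and PBKDF2 is used only because it is in the standard library.

```python
import hashlib
import secrets


def new_salt(num_bytes: int = 16) -> bytes:
    """16 bytes = 128 bits, drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


def derive(password: bytes, salt: bytes) -> bytes:
    """Derive a hash from the password and salt (illustrative iteration count)."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, 10_000)
```

Two users who pick the same password end up with different stored hashes as long as their salts differ, and generating the larger salt costs essentially nothing next to the KDF itself.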
[42:21.710 --> 42:24.630]  Awesome. Another question we have,
[42:24.630 --> 42:26.770]  and there were some side parts to this, of course, in chat
[42:26.770 --> 42:29.730]  if you want to go further into them, as I saw. But the
[42:29.730 --> 42:32.790]  question was, any thoughts on using libsodium, the fork,
[42:32.790 --> 42:35.750]  instead of NaCl? I have had great experiences
[42:35.750 --> 42:38.630]  with it. Yeah, libsodium
[42:39.070 --> 42:41.930]  is great. It is being adopted much more widely, as
[42:41.930 --> 42:44.590]  I stand corrected. NaCl was last
[42:44.590 --> 42:48.070]  updated at least five or six years ago.
[42:48.470 --> 42:50.950]  I have a slight preference for NaCl just because
[42:50.950 --> 42:53.770]  I like the documentation and the names of the APIs
[42:53.770 --> 42:56.770]  and the ease with which they are taking
[42:56.770 --> 43:00.050]  away options: the more options
[43:00.050 --> 43:03.030]  you give, the higher the chances of it actually going
[43:03.030 --> 43:06.010]  wrong. So that way I feel NaCl is slightly
[43:06.010 --> 43:09.070]  better. But libsodium is absolutely great anyways.
[43:09.490 --> 43:11.550]  So that's my personal preference.
[43:12.510 --> 43:15.570]  Awesome. Does anyone have any other
[43:15.570 --> 43:18.470]  Q&A questions? Please drop them into the Discord
[43:18.470 --> 43:19.330]  now.
[43:21.930 --> 43:24.670]  This is in the CPV Talk Q&A text
[43:24.670 --> 43:35.850]  channel. And we'll just wait
[43:35.850 --> 43:38.890]  for a little bit. Also, thank you so much for your talks. These are
[43:38.890 --> 43:41.750]  really lovely, and I'm really excited for the next one to be replayed
[43:42.270 --> 43:43.130]  again soon.
[43:43.130 --> 43:47.510]  Thank you. I'm having a lot of fun
[43:47.510 --> 43:49.870]  watching me talk for two hours.
[43:49.870 --> 43:53.770]  Oh, man, that's a lot of anxiety
[43:54.450 --> 43:55.950]  for me at least.
[43:56.690 --> 43:57.890]  That's fun.
[44:01.080 --> 44:04.700]  I think we have someone typing. Oh, no, just people
[44:04.700 --> 44:07.080]  love your talk. Thank you.
[44:07.200 --> 44:10.180]  Let's see if this last person has a question.
[44:17.030 --> 44:19.190]  All right. Thank you again,
[44:19.190 --> 44:22.030]  Mansi, for all your time with us. We hope you
[44:22.030 --> 44:24.910]  take care. Enjoy the rest of Crypto Village and DEF CON.
[44:25.150 --> 44:27.930]  We would love to see you in our Discord
[44:27.930 --> 44:28.830]  soon.
[44:29.990 --> 44:34.270]  Please stay safe, everyone. Please stay safe. Thank you. Bye-bye.
